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Abstract 

At least two recent developments have put the spotlight on some 
significant gaps in the theory of multivariate time series. The recent 
interest in the dynamics of networks; and the advent, across a range 
of applications, of measuring modalities that operate on different tem¬ 
poral scales. 

Fundamental to the description of network dynamics is the direc¬ 
tion of interaction between nodes, accompanied by a measure of the 
strength of such interactions. Granger causality (GC) and its asso¬ 
ciated frequency domain strength measures (GEMs) (due to Geweke) 
provide a framework for the formulation and analysis of these issues. 
In pursuing this setup, three significant unresolved issues emerge. 

Firstly computing GEMs involves computing submodels of vector 
time series models, for which reliable methods do not exist; Secondly 
the impact of filtering on GEMs has never been definitively established. 
Thirdly the impact of downsampling on GEMs has never been estab¬ 
lished. In this work, using state space methods, we resolve all these 
issues and illustrate the results with some simulations. Our discus¬ 
sion is motivated by some problems in (fMRI) brain imaging but is of 
general applicability. 


1 Introduction 

Following the operational development of the notion of causality by [18] and 
|41| . Granger causality (henceforth denoted GC) analysis has become an 
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important part of time series and econometric testing and inference e.g. [20] , 
It has also been applied in the biosciences, m, 12], IHI; climatology (global 
warming) [33|, [28], |l5]; and most recently functional magnetic resonance 
imaging (fMRI). 

Since its introduction into fMRI m m it has become the subject 
of an intense debate: e.g. see [38] and associated commentary on that 
paper. There are two main issues in that debate but which occur more 
widely in dynamic networks. Firstly, the impact of downsampling on GC. 
In the fMRI neuro-imaging application causal processes may operate on a 
time-scale of order tens of milli-seconds whereas the recorded signals are 
only available on a one-second time-scale. So it is natural to wonder if GC 
analysis on a slow time-scale can reveal dynamics on a much faster time- 
scale. Secondly, the impact of filtering on GC due to the hemodynamic 
response function which relates the neural activity to the recorded fMRI 
signal. Since intuitively GC will be sensitive to time delay, the variability 
of the hemodynamic response function, particularly spatially varying time 
to onset and time to peak (confusingly called delay in the fMRI literature) 
has been suggested as a potential source of problems [8].[2T]. 

An important advance in GC theory and tools was made by [H] who 
provided measures of the strength of causality (henceforth called GEM for 
Geweke causality measure) including frequency domain decompositions of 
them. Subsequently it was pointed out that the GEMs are measures of mu¬ 
tual information |36j . The GEMs were extended to conditional causality 
in m- However GEMs have not found as wide application as they should 
have, partly because of some technical difficulties in calculating them dis¬ 
cussed further below. But GEMs (and their frequency domain versions) are 
precisely the tool needed to pursue both the GC downsampling and filtering 
questions. 

In the econometric literature, it was appreciated early that downsam¬ 
pling, especially in the presence of aggregation could cause problems. This 
was implicit in work of [ID] ; mentioned also in work of [6] who gave an 
example of contradictory causal analysis based on monthly versus quaterly 
data and also discussed in [33] . But precise general conditions under which 
problems do and do not arise have never been given. We do so below. 

Some of the above econometric discussion is framed in terms of sampling 
of continuous time models m, [33],[6]. And authors such as m have sug¬ 
gested that models are best formulated initially in continuous time. While 
this is a view the author has long shared we deal with only discrete time 
models here. To cast our development in terms of continuous time models 
would require a considerable development of its own without changing our 
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basic message. 

The issue at stake, in its simplest form, is the following. Suppose that a 
pair of (possibly vector) processes posses a unidirectional GC relation but 
suppose measurements are only available at a slower time-scale on filtered 
series. Then two questions arise. The first, which we call the forward ques¬ 
tion, is this: Is the unidirectional Granger causal relation preserved? The 
second, which we call the reverse question, is harder. Suppose the down- 
sampled filtered series exhibit a uni-directional GG relation; does that mean 
the underlying unfiltered faster time-scale processes do? The latter question 
is the more important and so far has received no theoretical attention. 

In order to resolve these issues we need to develop some theory and some 
computational/modeling tools. Firstly to compute GEMs one needs to be 
able to find submodels from a larger (i.e. one having more time series) model. 
Thus to compute the GEMs between time series xt, yt mm attempted to 
avoid this by fitting submodels separately to xt to yt and then also fitting 
a joint model to xt,yt- Unfortunately this can generate negative values for 
some of the frequency domain GEMs [5]. Properly computing submodels 
will resolve this problem and previous work has not accomplished this (we 
discuss the attempts in m and [5] below). 

Secondly one needs to be able to compute how models transform when 
downsampled. This has only been done in special cases |34] or by methods 
that are not computationally realistic. We provide computationally reliable, 
state space based methods for doing this here. 

Thirdly we need to study the effect of filtering on GEMs. And then using 
these tools one can compute filtered downsampled GEMs and hence study 
the effect of sampling and filtering on GEMs. 

To sum up we can say that previous discussions including those above 
as well as [I3],[11],11Q],I33] fail to provide general algorithms for finding 
submodels or models induced by downsampling. Indeed both these problems 
have remained open problems in multivariate time series in their own right 
for several decades and we resolve them here. Further there does not seem 
to have been any theoretical discussion of the effect of filtering on GEMs and 
we resolve that here also. To do that it turns out that state space models 
provide the proper framework. 

Throughout this work we deal with the dynamic interaction between two 
vector time series. It is well known that if there is a third vector time series 
involved in the dynamics but not accounted for then spurious causality can 
occur for reasons that have nothing to do with downsampling. This situation 
has been discussed by [25]; see also [T5]. Other causes of spurious causality 
such as observation noise are also not discussed. Of course the impact of 
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downsampling in the presence of a third (vector) variable is also of interest 
but will be pursued elsewhere. 

Finally our whole discussion is carried out in the framework of linear 
time series models. It is of great interest to pursue nonlinear versions of 
these issues but that will be a major task. 

The remainder of the paper is organized as follows. In section 2 we review 
and modify some state space results important for system identification or 
model fitting and needed in the following sections. In section 3 we develop 
state space methods methods for computing submodels of innovations state 
space models. In section 4 we develop methods for transforming state space 
models under downsampling. In section 5 we review GC and GEMs and 
extend them to a state space setting. In section 6 we study the effect of 
filtering on GC via frequency domain GEMs. In section 7 we give theory to 
explain when causality is preserved under downsampling. In section 8 we 
discuss the reverse problem showing how spurious causality can be induced 
by downsampling. Conclusions are in section 9. There are three appendices. 

1.1 Acronyms and Notation 

GC is Granger causality or Granger causes. We use the GC designator 
alone where we make statements of interest in both weak and strong cases, 
dn-gc is does not Granger cause; GEM is Geweke causality measure; SS is 
state space or state space model; ISS is innovations state space model; VAR 
is vector autoregression; VARMA is vector autoregressive moving average 
process; wpl is with probability 1. 

denotes the values Xa, Xa+i, ■'' > ^b', so A“ = Xa- For stationary 
processes we have a = —oo. z~^ = L is the lag or backshift operator; LHS 
denotes left hand side etc. If M, N are positive semi-definite matrices then 
M > N means M — N is positive semi-definite. A square matrix is stable if 
all its eigenvalues have modulus < 1. 

2 State Space 

The computational methods we develop rely on state space techniques and 
spectral factorization. 

There is an intimate relation between the steady state Kalman filter 
and spectral factorization which is fundamental to our computational pro¬ 
cedures. 

So in this section we review and modify some basic results in state space 
theory, Kalman filtering and spectral factorization. 
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In the sequel we deal with two vector time series, which we collect to¬ 
gether as, zt = {xj. 

2.1 State Space Models 

We consider a general constant parameter SS model, 

+ Wt, Zt = C^t + Vt ( 2 - 1 ) 

with positive semi-definite noise covariance, var(^p = ^). We refer to 

this as a SS model with parameters (A,C,[Q,R,S]). 

It is common with SS models to take S = 0, but for equivalence be¬ 
tween the class of VARMA models and the class of state space models it is 
necessary to allow S' / 0. 

Now by matrix partitioning, |(^ ^)| = |ii||(5s|,Qs = Q — SR~^S^. So 
introduce. 

Noise Condition N. R is positive definite, 
whereupon Qg is positive semi-definite. 

2.2 Steady State Kalman Filter, Innovations State Space 
(ISS) Models and the Discrete Algebraic Ricatti Equa¬ 
tion (DARE) 

We now recall the Kalman filter for mean square estimation of the unob¬ 
served state sequence from the observed time series zt- It is given by 
[26] (Theorem 9.2.1), 

It+i = A^t + Ktet, et = Zt- C^t, or zt = C^t + et 

where et is the innovations sequence of variance Vt = R + CPtC^ and Kt = 
{APtC^ + S)Vi^ is the Kalman gain sequence and Pt is the state error 
variance matrix generated from the Ricatti equation, Pt+i = APtA^ + Q — 
KtVtKj. 

The Kalman filter is a time-varying filter but we are interested in its 
steady state. If there is a steady state i.e. Pj —)• P as t oo then then 
the limiting state error variance matrix P will obey the so-called discrete 
algebraic Ricatti equation (DARE) 

P = APA^ + Q- KVK'^, (2.2) 

where V = R + CPC^ and K = (APC^ + S)V-^ is the corresponding 
steady state Kalman gain. With some clever algebra [26] (section 9.5.1) the 
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DARE can be rewritten (the Ricatti equation can be similarly rewritten), 

P = AsPA^ + Qs- KsVK^ 

where Ag = A — SR~^C and Kg = AgPC^V~^. 

We now introduce two assumptions. 

1 

Stabilizability Condition St: As,Qs is stabilizable (see Appendix A) 
Detectability Condition De: Ag^C is detectable. 

In Appendix A it is shown this is equivalent to A, C being detectable. And 
also it holds automatically if A is stable. 

The resulting steady state Kalman filter can be written as, 

^4+1 = A^t + K^ti Zt = C^t + (2-3) 

where et is the steady state innovation process and has variance matrix V 
and Kalman gain K. This steady state filter provides a new state space 
representation of the data sequence. We refer to it as an innovations state 
space (ISS) model with parameters {A,C, K,V). We summarize this in, 
Result I. Given the SS model (12.ip with parameters {A,C, [(3,R, S']), then 
provided N,St,De hold: 

(a) The corresponding ISS model (|2.3I) with parameters {A, C, K, V) can 
be found by solving the DARE (12.2p which has a unique positive definite 
solution P. 

(b) V is positive definite, {A, C) is detectable and A — KC is stable so 
that (A, K) is controllable. 

Proof. See appendix A. 

Remarks. 

(i) Henceforth an ISS model with parameters {A,C, K,V) will be re¬ 
quired to have V positive definite, {A, C) detectable and (A, K) controllable 
so that A — KC is stable. 

(ii) It is well known that any VARMA model can be represented as an 
ISS model and vice versa [32].[22]. 

(hi) Note that the ISS model with parameters {A,C,K,V) can also be 
written as the SS model with parameters (A, C, [KVK"^, V, KV]). 

(iv) The DARE is a quadratic matrix equation but can be computed 
using the (numerically reliable) DARE command in matlab as follows. Com¬ 
pute: [P, Lo, G] = DARE{A^, , Q, R, S, I) and then, V = R+CPC^, K = 
G^. 

(v) Note that stationarity is not required for this result. 
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2.3 Stationarity and Spectral Factorization 

Given an ISS model with parameters we now introduce, 

Condition Ev: A has all eigenvalues with modulus < 1 i.e. ^ is a stability 
matrix. 

With this assumption we can obtain an infinite vector moving average (VMA) 
representation, an infinite vector autoregressive (VAR) representation and a 
spectral factorization. The following result is based on [26] [Theorem 8.3.2] 
and surrounding discussion. 

Result II. For the ISS model {A, C, B, Sg) obeying condition Ev we have, 

(a) Infinite VMA or Wold decomposition, 

zt = H{L)et = {C{L-^I - A)-^B + I)et = {C{I - AL)-^BL + I)et (2.4) 


(b) Infinite VAR representation, 

Ci = G{L)zt = [I + C{L-^I - A + KC)-^K]zt (2.5) 


(c) Spectral factorization. Put L = exp(—jA) then, zt has positive definite 
spectrum with spectral factorization as follows. 


■ Q S' 

' {LI - A'^y^G'^' 

S'^ R 

I 


fz{X) = [C{L-^I-A)-\l] ^ = R(L)S,R^'(L-1) 

( 2 . 6 ) 

(d) H{L) is minimum phase i.e. its inverse exists and is causal and stable. 


Proof, (a). Just write (12.311 in operator form. The series is convergent wpl 
and in mean square since A is stable. 

(b) . Rewrite (|2.3p as, it+i = (A - KC)it + Kzt,et = zt - C^f Then 
write this in operator form. The series is convergent wpl and in mean square 
since A — KC is stable and zt is stationary. 

(c) . Follows from standard formulae for spectra of filtered stationary 
time series applied to (a). 

(d) . From (a),(b) G{L) = and by (b) G{L) is causal and stable 

and the result follows. 


Remarks. 

(i) For further discussion of minimum phase filters see mm- 

(ii) Result II is a special case of a general result that given a full rank 
multivariate spectrum /z(A) there exists a unique causal stable minimum 
phase spectral factor H{L) with H{0) = I and positive definite innovations 
variance matrix such that (I2.6|l holds mm- In general detH{L) may 
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have some roots on the unit circle |23j.|19j but the assumptions in result II 
rule this case out. Such roots mean that some linear combinations of zt can 
be perfectly predicted from the past [23],[19] something that is not realistic 
in the fMRI application. 

(iii) Result II is also crucial from a system identification or model fitting 
point of view. From that point of view all we can know (from second order 
statistics) is the spectrum and so if, naturally, we want a unique model, the 
only model we can obtain is the causal stable minimum phase model i.e. 
the ISS model. The standard approach to SS model fitting is the so-called 
state space subspace method mm and indeed it delivers an ISS model. 
The alternative approach of fitting a VARMA model mm is equivalent 
to getting an ISS model. 

(iv) We need result I however since when we form submodels we do not 
immediately get an ISS model, rather we must compute it. 

3 Submodels 

Our computation of causality measures requires that we compute induced 
submodels. In this section we show how to obtain a ISS submodel from the 
ISS joint model. 

Now we partition zt = {xt,yt)'^ into subvector signals of interest and par¬ 
tition the state space model correspondingly, C* = (^^) and B = {Bx, By)- 
We first read out a SS submodel for xt from the ISS model for zt- We have 
simply ^t+i = Mt + wt, xt = Cx^t + where, wt = B{l^’l ) = Bxex,t + 

Byey^f We need to calculate the covariance matrix, ^). We 

find, R = Tix,ei Q = var{wt) = BTi^B^ and, S = E{wte^^) = B{^^^ ) = 
Bq. This leads to 

Theorem I. Given the joint ISS model (|2.3I) or (|2.4I) for zt, then under con¬ 
dition Ev, the corresponding ISS submodel for xt namely (A, Gx, A'(x)) ^x) 
(the bracket notation K(^x) is used to avoid confusion with e.g. Cx,Ex,e 
which are submatrices) can be found by solving the DARE (12.2|) with [Q, R, 
Sx,e, Bo]. 

Proof. Eirstly we note by partitioning I Eel = |Sx,e| |T(y|x),e| where S(y|x),e = 
^y,e - ^YX,e'^x\ ExY,e SO that Sx,e and S(y|x),e are both positive defi¬ 
nite. Now we need only check conditions N,St,De of result I. We need 
to show, R = Sx,e is positive definite,(A, Cx) is detectable and (A — 
SR ^Cx,{BTi^B"^ — SR ^R )2 is stabilizable; in fact we show it is con¬ 
trollable. 
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The first is already established. The second follows trivially since A is 
stable. We use the PBH test (see Appendix A) to check the third. 

Suppose controllability fails, then by the PBH test, there exists g / 0 
with Aq^ = q^(A - ^~^Cx) and 0 = q^{B^,B'^ - ^ 0 = 

= q^{B^,B^ - = 

q^iBx,By)C, 

= q'^BY^(Y\x),eBY ^ 0 = q^BY^(Y\X),eBYq ^|| B^q 11 = 0 ^ q^By = 0 
since T,(Y\x),e is positive dehnite. But then, = q"^{A—{Bx, By){^^^ )T,p/Jx) 

q^{A—{B xCx+By'Byx,J^PjJx) = (F{A—BxCx)- Thus (A—HxCx, By) 
is not controllable. But this is a contradiction since we can find a matrix 
,namely Cy so that A — BxCx — ByCy = A — BC is stable. 

Remarks. 

(i) For implementation in matlab positive definiteness in constructing Q 

can be an issue. A simple resolution is to use a Cholesky factorization of 
Eg = and form Bg = BLg and then form Q = BgBj. 

(ii) The P(x) matrix from DARE, 

[B(x),Bo,G] = DARE{A'^,Cx,BT,eB'^,T,x,e,Bo,I) obeys B(x) = AB(x)A^+ 
BEeB"^-K( xFxKJx) and then K(^x) = {AP{x)Cx + Bo)^p^^x = Ex,e + 
CxPixFl- 

(iii) m discuss a method for obtaining submodels but it is flawed. 

Firstly it requires the computation of the inverse of the VAR operator. 

While this might be feasible (analytically) on a toy example, there is no 
known numerically reliable way to do this in general (computation of de¬ 
terminants is notoriously ill-conditioned). Secondly it requires the solution 
of simultaneous quadratic autocovariance equations to determine VMA pa¬ 
rameters for which no algorithm is given. In fact these are precisely the 
equations required for a spectral factorization of a VMA process. There do 
exist reliable algorithms for doing this but given the flaw already revealed 
we need not discuss this approach any further. 

Next we state an important corollary: 

Corollary I. Any submodel is in general a VARMA model not a VAR. To 
put it another way the class of VARMA models is closed under the forming 
of submodels whereas the class of VAR models is not. 

This means that VAR models are not generic and is a strong argument 
against their use. Any vector time series can be regarded as a submodel of 
a larger dimensional time series and thus must in general obey a VARMA 
model. This result (which is well known in time series folk lore) is significant 
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for econometrics where VAR models are in widespread use. 

For the next section we need, 

Theorem II. For the joint ISS model (12.3p or (12.41) for zt with conditions 
St,De holding and with induced submodel for xt given in Theorem I, we 
have, 


fx{\) = hx{exp{-j\))VLxh^{exp{j\)) (3.1) 

hx{L) = I + Cx{L-^I-A)-^Ki^x)=I + LCx{I-AL)-^K(x) 

/ 7T 

H/x(A)|- 

4 Downsampling 

There are two approaches to the problem of finding the model obeyed by a 
downsampled process; frequency domain and time domain. While the the 
general formula for the spectrum of a sampled process has long been known, 
it is not straightforward to use and has not yielded any general computa¬ 
tional approach to finding submodels of parameterized spectra. Otherwise 
the most complete (time domain) work seems to be that of [34] who only 
treat the hrst and second order scalar cases. There is work in the engineer¬ 
ing literature for systems with observed inputs but that is also limited and 
in any case not helpful here. We follow a SS route. 

We begin with the ISS model ()2.3p . Suppose we downsample the ob¬ 
served signal Zt with sampling multiple m. Let t denote the fine time scale 
and k the coarse time scale so t = mk. The downsampled signal is = Zmt- 
To develop the SS model for 'zk we iterate the SS model above to obtain 

6+z = A^it + s' 

Now set t = mk, I = m and denote sampled signals, = z^kAk = 

€mk- Then we find. 


Wk , Zk — C^k + Sfc 

where Wk = Btkm+i-i- We now use result I to find the ISS model 

corresponding to this SS model. 

We have hrst to calculate the model covariances. 


^i^k^'k) — Sf; — i? 

E{wkel) = A™-iRS, = 5™ 
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(4.1) 


) — Qm 

Qm = A^{A^Y (4.2) 

^ Qm = AQm-iA'^ + BT.^B^,m>2,Qi = BT.^B^ 

We now obtain, 

Theorem III. Given the ISS model (12.3p . then under condition E, for m > 

1, the ISS model for the downsampled process Zk = Zmt is {A'^,C, K^,VY) 
obtained by solving the DARE with SS model {A'^,C,[Qm, R, Sm]) where 
Qm is given in (14.2p and Sm is given in (14.ip . 

Proof. Using result I we need to show the following. R is positive definite, 

{A^,C) is detectable and (A™' — SmR~^C), {Qm — SmR~^Sm)^) is stabi- 
lizable; in fact we show controllability. The first holds trivially; the second 
also since A is stable and thus so is A™. For the third we use the PBH test. 

Suppose controllability fails. Then there is a left eigenvector q (possibly 
complex) with Xq'^ = q'^{A^ — SmR~^C) = q'^A^~^{A — BC) and q^{Qm — 
SmR~^SlY = 0 = q^YYf-^A^BT.^B'^{A^Y ^ TYf-^q^^Y~^A^Bi:,B'^{A^Yq = 

0. Since is positive definite this delivers || B'^{A'^Yq |p= 0 ^ 

B'^{A'^Yq = 0 for r = 0, • • • , m — 2. 

Using this, we now find, Xq^ = q"^A^~^{A — BC) = q"^A^~^{A — BCY + 
qTj\m-2 bc{A — BC) = q^A^~‘^{A — BCY. Iterating this yields, = 
q^{A — BC)"^. Thus if Am is an m-th root of A then Xm<l^ = q^{A — BC). 

Since also q"^B = 0 we thus conclude (A — BC,B) is not controllable. But 
this is a contradiction since, (A — BC) + BC = A is stable. 

Remarks. 

(i) In matlab we would compute, [Pm, Lq, Cm] = DARE((A™')'^, C'^, Qm, X^e, Sm,I), 

yielding + CP^C'^ and Aim = G^. 

(ii) More specifically Pm > 1) obeys, Pm = Am Rm^m + Q m 

RmVYKL 

where Am = A™ - SmX^J^C = A™-i(A - BC); Qm = Qm - SmX^J^Sl = 

Qm-i; VY = X^. + CPf^C^; K*^ = AmP;,C^{VY)-^. 

5 Granger Causality 

In this section we review and extend some basic results in Granger causality. 

In particular we extend GEMs to the state space setting and show how to 
compute them reliably. 

Since the development of Granger causality it has become clear [ID] . m 
that in general one cannot address the causality issue with only one step 
ahead measures as commonly used; one needs to look at causality over all 
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forecast horizons. However one step measures are sufficient when one is 
considering only two vector time series as we are [inj [Proposition 2.3]. 

5.1 Granger Causality Definitions 

Our definitions of one step Granger causality naturally draw on m,mmm 
but are also influenced by [3], who, drawing on work of [35], distinguished 
between weak and strong GC or what Gaines calls weak and strong feedback 
free processes. We introduce: 

Condition WSS: The vector time series xt, yt are jointly second order sta¬ 
tionary. 

Definition: Weak Granger Cansality. 

Under WSS, we say yt does not weakly Granger cause (dn-wgc) xt if, for all 
t 

mr(W+i|Xioo,^-oo) = var{Xt+i\X^_^) 

Otherwise we say yt weakly Granger causes (wgc) xt- 

Because of the elementary identity, var{X\Z) = E[var{X\Z, W)]+var[E{X\Z, W)] 
E[var{X\Z, W)]+E[{E{X\Z,W)-E{X\Z)){E{X\Z,W)-E{X\Z)f\Z] the 
equality of variance matrices in the dehnition also ensures the equality of 
predictions, E{Xt+i\Xl^,Y^_^) = E{Xt+i\Xl^). 

This definition agrees with |18].[3] who do not use the designator weak 

and mm who do. 

Definition: Strong Granger Causality. 

Under WSS, we say yt does not strongly Granger cause (dn-sgc) xt if, for 
all t, 

var{Xt+t\Xt^,Yl^,Yt+i) = uar(W+i|Xi^) 

Otherwise we say yt strongly Granger causes (sgc) xt- Again equality of the 
variance matrices ensures equality of predictions, E{Xt+i\X^^,Yl^,Yt^i) = 
E{Xt+t\XL^). 

This dehnition agrees with [3| and |42] . 

Definition: FBI. Feedback Interconnected. 

If Xt Granger causes yt and yt Granger causes xt then we say xt,yt are 
feedback interconnected. 

Definition: UGC. Unidirectionally Granger Gauses. 

If Xt Granger causes yt but yt dn-gc xt we say xt unidirectionally Granger 
causes yt- 
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5.2 Granger Causality for Stationary State Space Models 


Now we partition zt = {xt,yt)^ into subvector signals of interest and parti¬ 
tion the vector MA or state space model (12.4p correspondingly, 


Xt 

Vt 


Hxx{L) Hxy{L) 
Hyx{L) Hyy{L) 


[1 + 


{L-^I-A)-\Bx By) 


Cx 
Cy 

Hxx{L) Hxy{L) 
_Hyx{L) Hyy{L)\ [eyt 





. _ 


var 


ex,t 


^XY,e 



.Syx,e Sye 


ex,i 


(5.1) 

(5.2) 


Cx{L-^I-A)-^Bx + I Cx{L-H-A)-^By 
Cy{L-^I - A)-^Bx Cy{L-^I - A)-^By + I 


Now we recall results of [3]: 

Result III: If Zt = (*() obeys a Wold model of the form zt = Hz{L)et 
where Hz{L) is a one-sided square summable moving average polynomial 
with Hz{^) = I which is partitioned as in (15.2p then: 

(a) yt dn-wgc xt iff Hxy{L) = 0. 

(b) yt dn-sgc Xt iff Hxy{L) = 0 and = 0. 

We can now state a new SS version of this result: 

Theorem IV. For the stationary ISS model (|5.1I5.2I) : 

(a) yt dn-wgc xt iff CxA'^By = 0, r > 0. 

(b) yt dn-sgc xt iff CxA^By = 0, r > 0 and Sxye = 0. 

Proof. Follows immediately from result III since Frxy(.b) = Ti'^CxA^B yL"^^^ ■ 


Remarks. 

(i) By the Cayley Hamilton Theorem we can replace (a) with: CxA^By = 

0,0<r<n — l,n = dim[f^t)- 

(ii) Collecting these equations together gives Cx{By, ABy, • • •, A'^~^By) = 
0 which says that the pair {A, By) is not controllable. Also we have, 

By(Cj, • • •, (A^)”'“^C'J) = 0 which says that the pair (Cx,A) is 

not observable. Thus the representation of Hxy{L) is not minimal. 

From a data analysis point of view we need to embed this result in a 
well behaved hypothesis test. Results of m, suitably modified, allow us to 
do this. 


5.3 Geweke Causality Measures for SS Models 

Although much of the discussion in m is in terms of VARs we can show 
it applies more generally. We begin as |14] did with the following defi- 
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nitions. Firstly, Fy^x = ^ measure of the gain in using the 

past of y to predict x beyond using just the past of x] similarly intro- 

measure, 


duce Fx^y = In ■ Next define the instantaneous influence 
I 115 ^ I 

Fy.x = In ' I ’’’ • These are then joined in the fundamental decomposi¬ 
tion [T4] . 

FxoY = Fy^x + Fx^y + Fy.x (5.3) 

where, FxoY = [H] then proceeds to decompose these measures 

in the frequency domain. Thus the frequency domain GEM for the dynamic 
influence of yt on xt is given by in, 

l/x(A)| 


Fy^x = 


and /e(A) is assembled (following [T3]) as follows. 

Introduce W = T.yx and note that ey^t — Wex,t is uncorrelated 

with ex,t and has variance E(y|x),e = — T.YX,e^x^^^XY,e- Then rewrite 

(15.2p as 


'xt' 


Hxx{L) Hxy{L) 

■ / o' 

I o' 

^x,t 

.yt. 


.Hyx{L) Hyy(L) 

WI 

_-WI 

. ^y,t. 


Hxx{L) -I- Hxy{L)W, Hxy{L) ex,t 

Hyx{L) + HYYiL)W, Hyy{L) eY,t -Wex,t 

This corresponds to (3.3) in [14] and yields the following expressions corre¬ 
sponding to those in HU. 

fxiL) = UL) + HxY{m^Y\x),eH]^YiL-^) (5.5) 

fe{L) = Hex{mx,eHlx{L-^) (5.6) 

Hex{L) = Hxx{L) + Hxy{L)W 

Using the SS expressions above we rewrite Hex{L) in a form more suited to 
computation as, 

Hex{L) = [Cx{L-^I-A)-^Bo + I] 

Bo = Bx + BYY.ly^j:-\ = B{^^n^x\ 

Eyx,e 

Note that then, using Theorem II, 

Jn|/x(A)|-- J Jn\U\)\- 


(5.7) 


= ln\Qx\ — ln\'Ex,e\ = In 


|Ex,.| 


(5.8) 
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Clearly, with L = exp{-jX), fx{L) > fe{L) ^ Fy^x > 0. 

Also the instantaneous causality measure is, 

Fy.x 

^{X\Y),e 

Clearly > 5](x|y),e so that -Fy.x > 0. 

Introduce the normalised cross covariance based matrix, Tx,y = Sy ^ Syx,eS^^^Sxy^eSy ^. 
Then using a well known partitioned matrix determinant formula |32] we find 
Fy.x = ln\I — 'Xx^y\- This means that the instantaneous causality measure 
depends only on the canonical correlations (which are the eigenvalues of 
^x,y) between ex,t,ey,t, [39],[29]. 

To implement these formulae, we need expressions for Qx,^y, fxW- 
To get them [TJ] fits separate models to each of xt and yt- But this causes 
positivity problems with fy^xW [S]- Instead we obtain the required quan¬ 
tities from the correct submodel obtained in the previous section. We have. 

Theorem Va. The OEMs can be obtained from the joint ISS model (j5.1h 
and the submodel in Theorem II, as follows, 

(a) Fy^x = where fix is got from the submodel in Theorem II. 

(b) The frequency domain GEM /y^x(A) (15.4p can be computed from 

(15S1),(13I]),(15Z3). 

And fly,-Fx^y,/x^>y (A) can be obtained similarly. 

Now pulling all this together with the help of result III we have an 
extension of the results of m to the state space/VARMA case. 

Theorem Vb: For the joint ISS model (15.ip . 

(a) Fy^x > 0, Fy.x > 0 and Fy^x + Ey.x = j • 

(b) yt dn-wgc xt iff, fx{L) = fe{L) which holds iff Fy^x = 0 i.e. iff 
^X = Ex,e- 

(c) yt dn-sgc xt iff fx{L) = fe{L) and E(x|y),e = Ex,e i.e. iff Fy^x = 0 
and Fy.x = 0 i.e. iff Fy^x + Fy.x = 0 i.e. iff fix = ^(x|y),e- 

Remarks. 

(i) A very nice nested hypothesis testing explanation of the decomposi¬ 
tion ()5.3p is given by Parzen in the discussion to |14j . 

(ii) It is straightforward to see that the GEMs are unaffected by scaling 
of the variables. This is a problem for other GC measures |12j . 

(hi) For completeness we state extensions of the inferential results in HD 
without proof. Suppose we fit a SS model to data zt, t = 1, • • • , T using e.g. 
so-called state space subspace methods or VARMA methods in e.g. 


= In 


|Sx,.||Ey,, 


= In- 




S(x|y),6ll5^ 


= In-, 


|Sx,. 


Tel 


-(X|y),6l 


(5.9) 


= Ex,e - Sxy,eEy^^^Syx, 


15 








[3T] . Let Fy^x, Fx^y, Fy.x, FxoY be the corresponding GEM estimators. 
If we denote true values with a superscript 0, we find under some regularity 
conditions: 


Hq : Fy^x = 0 TFy^x => as T ^ oo 

and Ho : Fyx = 0 ^ TFyx => Xp^py 

So to test for strong GC we put these together, 

Ho : F^^x = 0> ^YX = 0 ^ T{Fy^x + Fyx) => X(2n+l)p,py 

Together with similar asymptotics for Fx^y, FxoY we see that the funda¬ 
mental decompositon (15.3p has a sample version involving a decomposition 
of a chi-squared into sums of smaller chi-squared statistics. 

(iv) [5] attempts also to derive Fy^x without fitting separate models 
to xt,yt- However the proposed procedure to compute fxW involves a two 
sided filter and is thus in error. The only way to get /x(A) is by spectral 
factorization (which produces one-sided or causal filters) as we have done. 

(v) Other kinds of causality measures have emerged in the literature 
e.g. HZ] but it is not known whether they obey the properties in theorems 
IVa,IVb. However these properties are crucial to our subsequent analysis. 

6 Effect of Filtering on Granger Causality Mea¬ 
sures 

Now the import of the frequency domain GEM becomes apparent since it 
allows us to determine the effect of one-sided (or causal) filtering on GC. 

We need to be clear on the situation envisaged here. The unfiltered time 
series are the underlying series of interest but we only have access to the 
filtered time series. So we can only find the GEMs from the spectrum of the 
filtered time series. What we need to know is when those filtered GEMs 
are the same as the underlying GEMs. We have, 

Theorem VI. Suppose we filter zt with a stable, full rank, one-sided filter 

ML) = *4m)then, 

(a) If d>(L) is minimum phase then the GEMs (and so GC) are unaffected 
by filtering. 

(b) If ^(L) has the form <h(L) = '0(L)$(L) where '4>{L) is a scalar all 
pass filter and ^{L) is stable, minimum phase then the GEMs (and so GC) 
are unaffected by filtering. 
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(c) If ^{L) is nonminimum phase and case (b) does not hold then the 
GEMs (and so GC) are changed by filtering. 

Proof. Denote zt = ^{L)zt = ^{L)H{L)et by Result 11(a). Then for the 


frequency domain GEM we need to find, fy^xW — 


\fxW\ 




where L = exp(—jA). We find trivially that. |/x(A)| = |<I>x(R)/x(A)<hx(E~^)| = 
\^x{L)\\fx{^)\\^x{L~^)\- Finding H^x{L) is much more complicated; we 
need the minimum phase vector moving average or state space model cor¬ 
responding to (|5.2I1 . Taking ^{L) to be non-minimum phase we carry out a 
spectral factorization, fzW = H{L)'EH'^{L~^) where H{L) is causal, sta¬ 
ble, minimum phase with H{Q) = I and then from appendix C, H[L) can be 
written, H{L) = ^{L)H{L)D{L-^),D{L-^) = where E{L) 

is all pass and J, J are constant matrices (Cholesky factors). Writing this 
in partitioned form. 


H{L) 


^x{L) 0 

0 <I>y(L) 


Hxx{L)Hxy{L) Dxx{L ^)Dxy{L ^) 
Hyx{L) Hyy{L) Dyx{L-^) Dyy{L-^) 


yields = ^x{L)K^x{L) where, 

^exi^) = Hxx{L){Dxx{L~^) + DxY{L~^)TjYX,e^x]j 
+ HxY{L){DYY{L~^)^YX,e'P‘x]e + Dyx{L~^)) 


Thus in fy^xW the \^x{L) \ factors cancel giving, fy^xW = In ^j^ ^(l)Sx (l-i 

This will reduce to fy^xW iff ^exi^) — Hex{L)K{L~^) where K{L~^) 
is all pass which occurs iff Dxy{L~^) = 0,Dyx(E~^) = 0,Dxx(-b~^) = 
Dyy{L~^) = where V'(L“^) is a scalar all-pass filter. Re¬ 

sults (a),(b),(c) now follow. 

We now give two examples. 

Example I. Differential delay. Suppose (^() = (^ i)(b*) $(T) = 

(q ^). So the two series are white noises that exhibit an instantaneous 
GC. The filtering delays one series relative to the other. Then we have, 

= (o IK Ktl) = (lp = H{L){1). And we see that ^(0) = / 

while H{L) is stable, causal and invertible, indeed H~^{L) = (_^p 'j’). Thus 
we see that the differential delay has introduced a spurious dynamic GC 
relation and the original purely instantaneous GC is lost. 

Example II. fMRI Hemodynamic Response is non-minimum phase. A 
number of stylized or ‘canonical’ HRFs based on the double gamma (i.e. 
difference of two gamma functions) have been presented in the literature 
e.g. mm- These stylized HRFs capture two essential features of empirical 
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Canonical Motor HRF 



Figure 1: Canonical Motor Cortex HRF and its (log) Roots 


HRFs; namely a slow rise to a peak followed by a small negative undershoot. 
And past practice has been to use one of them for all voxels in a slice or 
even volume. Here we illustrate with a motor cortex HRF from [16] 


h{t) = 

Tam np 


where {Ta,m) = (1.1,5) and {Tb,p) = (.9,12) while a = .4. Also we have 
scaled each term to have maximum value of 1. Here fa,fb are amplitudes 
to be found in a model fitting exercise. In Fig.l we show a plot of the HRF 
with /a = 1 = /fe and the zeros on a log scale. One zero has magnitude > 1 
showing the HRF is non-minimum phase. 


7 Downsampling and Forwards Granger Causality 

We now consider to what extent GC is preserved under downsampling. 

Using the sampled notation of our discussion above, and defining Zk = 
(f^), we have the following result: 

Theorem VII. Forwards Causality. 
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(a) If yt dn-sgc xt then dn-sgc Xk- 

(b) If yt dn-wgc xt then in general yk wgc Xk- 
Remarks. 

(i) Part (a) is new although technically a special case of a result of the 
author s established in a non SS framework. 

(ii) We might consider taking part(b) as a formalization of long standing 
folklore in econometrics mm that downsampling can destroy unidirec¬ 
tional Granger causality. However that same folklore is flawed because it 
failed to recognize the possibility of (a). The folklore is further flawed be¬ 
cause it failed to recognize the more serious reverse problem discussed below. 

Proof of (a). We use the partitioned expressions in the discussion leading 
up to result III. We also refer to the discussion leading up to Theorem VI. 

This allows us to write two decompositions. Firstly Wk = wx,k + WY,k 
where 

Wx,k = Bxex,km+i-l ,WY,k = 

From result III and the definition of dn-sgc 

'Wx,k is uncorrelated with wy^i for all k,l (7.1) 

The other decomposition is ) and 

ex,k is uncorrelated with €y,i for all k, I 

Next we note from Theorem IV that Cx{L~^I—A'^)~^ApBy 
0 for all p > 0. Thus we deduce 

CxWY,k = 0, for all k (7.3) 

We can now write 

Xk = Cx{L~^I - A^y^Wk + ex,k 

= Cx{L~^I - Ay~^wx,k + ^x,k (7.4) 

Vk = CYiL~^I — Ay~^Wx,k + CYiL~^I — Ay~^WY,k + yk 
Based on (17.4p we now introduce the ISS model for Xk 
Xk = Cx{L~^I - Ay~^K(^x)'‘^x,k + 

where irx,k is the innovations sequence. Using this we introduce the esti¬ 
mator K(^x)^x,k of wx,k and the estimation error wx,k = wx,k — K(^x)^x,k- 
Below we show 


(7.2) 

= Cx^TA^+'-^ByL'^ 


wx,k is uncorrelated with vx^i for all k, I (7-5) 
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We thus rewrite the model for yk as, 

Vk = Cy{L-^I -A^r^K^x)iyx,k + Ck 
Ck = Cy{L~^I - A^)~^[wY,k+Wx,k]+^Y,k 

Now we can construct an ISS model for = {I+Cy{L~^I—K(^Y))^Y, k 
where i'Y,k is the innovations sequence. In view of (|7.1l7.2l7.5p i'x,k and i'y,i 
are uncorrelated for all k,l. Thus we have constructed the joint ISS model 


'xk 


k'X,k 


= H{z) 


Vk. 

. k'Y,k _ 


. ^ I + CxiL-H-A^-^K^x) 0 

^ ^ [ Cy{L-^I - A^y^K^x) I + CY{L-^I-Ay-^K^Y) 

From this we deduce that yk dn-sgc Xk as required. 

Proof of (j7.5p . Consider then 

E{wx,ki^x,i) = Eiwx^k - K(^xyx,k)k'x^i) = E{wx,kk'x,i - K(x)E{vx,kk'x,i)- 
The second term vanishes for k ^ 1. The first term vanishes for k > I since 
wx,k is uncorrelated with the past and hence i'x,i] for ^ > A: it vanishes since 
ux^i is uncorrelated with the past. For k = I it vanishes by the definition of 

K(^x) [26]. 

Proof of (b). A perusal of the proof of (a) shows that we cannot construct 
the block lower triangular joint ISS model; in general we obtain a full block 
ISS model. 

8 Downsampling and Reverse Granger Causality 

We now come to the more serious issue of whether unidirectional Granger 
causality might arise from downsampling even though not present on the 
original timescale. To establish this we have simply to exhibit a numerical 
example but that is not as simple as one might hope. 

8.1 Simulation Design 

Designing a procedure to generate a wide class of examples of spurious 
causality is not as simple as one might hope. We develop such a proce¬ 
dure for a bivariate vector autoregression of order one; a bivariate VAR(l). 
On the one hand this is about the simplest example one can consider; on 
the other hand it is general enough to generate important behaviours. 
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The bivariate VAR(l) model is then, 


xt xt-i ex,t. 

yt yt-i eY,t' 


= 


-■ 4^x 'Jx ■ 

'Jy (t>y 


pcTaerb crl 


where S is the variance matrix of the zero mean white noise p is a 

' <^Y,t ' ’ ^ 

correlation. 

We note that this model can be written as an ISS model with parameters, 
A,—A, S. Hence all the computations desribed above are easily carried 
out. 

But the real issue is how to select the parameters. By a straightforward 
scaling argument it is easy to see the we may set fia = 1 = Ufe without loss 
of generality. Thus we need to choose only H, p. 

Some reflection shows that there are two issues. Firstly we must ensure 
the process is stationarity i.e. for the eigenvalues Ai,A 2 of A we must have 
I All < 1,|A2| < 1. Secondly to design a simulation we need to choose 
Fy^x, Fx^y] but these quantities depend on the parameters A, p in a 
highly nonlinear way so it is not obvious how to do this. And five parameters 
is already too many to pursue this by trial and error. 

For the first issue we have trace{A) = \i + \2 = (t>x F 4>y and det{A) = 
A 1 A 2 = 4>x4>y ~ Ixly Our approach is to select Ai, A 2 and then find (l)x,4>y 
to satisfy 4>x + 4>y = F ^ 2 , (f>x4>y = A 1 A 2 + 7x7y This requires solution of 
a quadratic equation. If we denote the solutions as r+,r_ then we get two 
cases: {4>x,4>y) = (r+,r_) and {4>y,4>x) = (r+,r_). This leaves us to select 


Ixi'Jy 

2 

In Appendix B we show that Fy^x = > ln{l F ^x) where ^x = 

7^(1 — p^). And similarly Fx^y > ln{l + ^y) where = 7^(1 — p^). But 
we also show that Cx = 0 ^ Fy^x = 0 and = 0 ^ Fx^y = 0. So we 
select thereby setting a lower bounds to Fy^x,Fx^Y- This seems 

to be the best one can do and as we see below works quite well. So given 

^X 7 ^y compute 7 a; = F--^S= and 'jy = F-^^=. This gives four cases and 

together with the two cases above yields eight cases. 

This is not quite the end of the story since the 7 a;, 7 y values need to be 
consistent with the (jix^^t’y values. Specifically the quadratic equation to be 
solved for (j)x,(py must have real roots. Thus the discriminant must be > 0. 

So (</>a; + </>y)^-4(0a;</>j/) = (Al + A 2 )^-4(Ai A 2 +7a;7y) = (Al - A 2 )^- 47 a; 7 y > 0. 

There are four cases; two with real roots, two with complex roots. 

If Ai,A 2 are real then we require (Ai — A 2 )^ > 47 a; 7 ^ ^ (Ai — A 2 )^ > 

4:sign{'jx'yy) '{^ ■ This always holds if signi'jxly) < 0. If sign{'yx7y) > 0 
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then we have a binding constraint which restricts the sizes of 

If Ai, A 2 are complex conjugates then (Ai—A 2 )^ is negative. If sign{'-^xly) > 
0 then the condition never holds. If sign{'yx7y) < 0 then there is a bind¬ 
ing constraint which restricts the sizes of In particular note that if 

sign{'^xly) = 0 then one cannot have complex roots for A. We now use this 
design procedure to illustrate reverse causality. 

8.2 Computation 

We describe the steps used to generate the results below. We assume the 
state space model for zt = (^‘) comes in ISS form. Since standard state 
space subspace model fitting algorithms [30],[S],[I] generate ISS models 
this is a reasonable assumption. Otherwise we use result I to generate the 
corresponding ISS model. 

Given a sampling multiple m we first use Theorem III to generate the 
subsampled ISS model and hence To obtain the GEMs we use The¬ 
orem I to generate the marginal models for xt , yt yielding . And 

now 3-^6 gotten from the formulae (I5.8p . (|5.9p and the comment 

following Theorem Va. 

8.3 Scenario Studies 

We now illustrate the various results above with some bivariate simulations. 
Example 1. GEMs decline gracefully. 

Table 1 . GEMs for various sampling intervals for 

(Ai, A2 ,?x,4>^') = X .l),-.95exp(-j x .1), 1.5, .2, .2) ^ A = 

/-.204 -1.24\ 

V .452 -1.69 h 


m 

1 

2 

3 

4 

5 

6 

10 

20 

30 

40 

Fy^x 

1.3761 

1.657 

1.408 

1.169 

0.994 

0.864 

0.551 

0.151 

0.001 

0.014 

Fx^y 

0.19834 

0.253 

0.287 

0.308 

0.319 

0.322 

0.293 

0.109 

0.001 

0.011 


Here, for the underlying process, y pushes x much harder than x pushes 
y. This pattern is roughly preserved with slower sampling, but the relative 
strengths change. 

Example 2. GEMs Reverse. 

Table 2 . GEMs for various sampling intervals for 

{Xi,X 2 ,^x,^yP) = {.95exp{jxA),.95exp{-jxA),1.5,.2,.2)^A = {\ll 
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m 

1 

2 

3 

4 

5 

6 

10 

20 

30 

40 

Fy^x 

0.92983 

0.879 

0.766 

0.683 

0.62 

0.57 

0.418 

0.131 

0.001 

0.013 

Fx^y 

1.0476 

1.824 

2.006 

1.795 

1.527 

1.3 

0.751 

0.18 

0.002 

0.016 


In this case the underlying processes push eachother with roughly equal 
strength. But subsampling yields a false picture with x pushing y much 
harder than the reverse. 

Example 3. Near Equal Strength Dynamics Becomes Nearly Unidirectional. 
Table 3 . OEMs for various sampling intervals for 
{Xi,X2,(x,(y,p) = (.995,-.7,1,.5, .7) ^ A = 


m 

1 

2 

3 

4 

5 

6 

10 

20 

30 

40 

Fy^x 

1.487 

0.051 

0.245 

0.057 

0.111 

0.052 

0.038 

0.019 

0.012 

0.009 

Fx^y 

1.685 

0.167 

0.638 

0.258 

0.467 

0.294 

0.289 

0.212 

0.159 

0.125 


In this case the underlying relation is one of near equal strength feedback 
interconnection. But almost immediately a very unequal relation appears 
under subsampling which soon decays to a near unidirectional relation. 
Example 4. Near Unidirectional Dynamics Becomes Near Equal Strength. 
Table 4 . OEMs for various sampling intervals for 

{Xi,X 2 ,(x,(y,p) = (.99exp(j X .25), .95exp(-j x .25), .1,3, -.8) A = 

/1.883 -0.408 N 
12.236 0.036 h 


m 

1 

2 

3 

4 

5 

6 

10 

20 

30 

40 

Fy^x 

0.023 

0.284 

0.381 

0.428 

0.451 

0.457 

0.309 

0.454 

0.407 

0.168 

Fx^y 

2.937 

2.384 

1.617 

1.243 

1.019 

0.859 

0.372 

0.532 

0.453 

0.178 


In this case a near unidirectional dynamic relation immediately becomes 
one of significant but unequal strengths and then one of near equal strength. 

There is nothing pathological about these examples and using the de¬ 
sign procedure developed above it is easy to generate other similar kinds 
of examples. They make it emphatically clear that GC cannot be reliably 
discerned from subsampled data. 

9 Conclusions 

This paper has given a theoretical and computational analysis of the use 
of Granger casuality in fMRI. There were two main issues: the effect of 
downsampling and the effect of hemodynamic convolution. To deal with 
these issues a number of novel results in multivariate time series and Granger 
causality were developed via state space methods as follows. 
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(a) Computations of submodels via the DARE (Theorems I,IV). 

(b) Reliable computation of OEMs via the DARE (Theorems Va,Vb). 

(c) Effect of filtering on OEMs (Theorem VI). In particular the destructive 
effect of the non-minimum phase property of HREs. 

(d) Computation of downsampled models via the DARE. 

Using these results we were able to develop, in section 8, a framework for 
generating downsampling induced spurious Granger causality ’on demand’ 
and provided a number of illustrations. 

All this leads to the conclusion that that Granger causality analysis of 
fMRI data cannot be used to discern neuronal level driving relationships 
. Not only is the time-scale too slow but even with faster sampling the 
non-minimum phase aspect of the HRF will still compromise the method. 

Future work would naturally include an extension of the Granger causal¬ 
ity results to handle the presence of a third vector time series. And also ex¬ 
tensions to deal with time-varying Granger causality. Non-Gaussian versions 
could mitigate the non-minimum phase problem to some extent but there 
does not seem to be any evidence for the non-Gaussianity of fMRI data. Ex¬ 
tensions to nonlinear Granger causality are currently of great interest but 
need a considerable development. 

10 Stabilizability, Detectability and DARE 

In this section we restate and modify for our purposes some standard state 
space results. We rely mostly on [26] [Appendices E,G]. 

We denote an eigenvalue of a matrix by A and a corresponding eigen¬ 
vector by q. We say A is a stable eigenvalue if |A| < 1; otherwise A is an 
unstable eigenvalue. 

10.1 Stabilizability 

The pair (A, B) is controllable if there exists a matrix G so that A — BG 
is stable i.e. all eigenvalues of A — BG are stable. {A,B) is controllable iff 
any of the following conditons hold, 

(i) Controllability matrix: C = [B, AB, • • •, A^~^B] has rank n. 

(ii) Rank Test: ranA:[A/ — A,B] = n for all eigenvalues A of A. 

(hi) PBH test: There is no left eigenvector of A that is orthogonal to B 
i.e. if q'^A = \q^ then q^B / 0. 
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The pair {A,B) is stabilizable if: rank[XI — A, B] = n for all unstable 
eigenvalues of A. Three useful tests for stabilizability are: 

(i) PBH Test: {A,B) is stabilizable iff there is no left eigenvector of 
A corresponding to an unstable eigenvalue that is orthogonal to B i.e. if 
q^A = Xq^ and |A| > 1 then q'^B / 0. 

(ii) {A,B) is stabilizable if {A,B) is controllable. 

(iii) {A,B) is stabilizable if A is stable. 

10.2 Theorem DARE 

[26] (Theorem E6.1, Lemma 14.2.1,section 14.7) 

Under conditions, N,St,De the DARE has a unique positive semi-dehnite 
solution P which is stabilizing, i.e. Ag — KgC is a stable matrix. Eurther if 
we initialize Pq = 0 then Pt is nondecreasing and Pt ^ P as t ^ oo. 
Remarks. 

(i) Ag — KsC stable means A—KC is stable (see below). And this implies 
that {A,K) is controllable (see below). 

(ii) Since V > R then N \s positive dehnite. 

Proof of (i). We first note (taking limits in) [26] (equation 9.5.12) Kg = 
K-SR-\ We have then Ag-KgC = A-SR-^C-{K-SR-^)C = A-KC. 
So Ag — KgC is stable iff A — KC is stable. But then {A, K) is controllable. 

10.3 Detectability 

The pair (A, C) is detectable if (A^, C'^) is stabilizable. 

Remarks. 

(i) If As is stable (all eigenvalues have modulus < 1) then S', D automat¬ 
ically hold. 

(ii) Condition D can be replaced with the detectability of (A, C) which 
is the way [26| states the result. We show equivalence below (this is also 
noted in a footnote in [26] Isection 14.7). 

Proof of Remark(ii). Suppose {Ag,C) is detectable but (A, C) is not. Then 
by the PBH test there is a right eigenvector p of A corresponding to an unsta¬ 
ble eigenvalue of A with Ap = Xp, Cp = 0. But then Agp = {A — SR~^C)p = 
Ap = Xp while Cp = 0 which contradicts the detectbility of [Ag^C). The 
reverse argument is much the same. 

Proof of Result I . (a) follows from the discussion leading to theorem 
DARE, (b) follows from the remarks after theorem DARE. 
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11 GEMs for Bivariate VAR(l) 


Applying formula (|5.5p and reading off Hxx etc. from the VAR(l) model 
yields, 

/x(A) = [(1 + - 2(i)yCos{\))al + -flal - 2paa(Th{lx(t)y - 7xCos(A))]/|R>(e^^)p 

D{L) = (1 - - (pyL) - -ixlyL^ 

This can clearly be written as an ARMA(2,1) spectrum cj^|l— 

Equating coefficients gives 

To = + ^l) = (1 + + ^Wx - ‘^PC^a(yhlx(l>y = ^a(l + + 4) 

Tl — ^x^x — ^a^x 


where ^x = {^ — p‘^)lx^b ~ ^y~ AV^e thus have = qf/cr^ and 

using this in the first equation gives, 70 = cr^ +7i/o'^ or — o'x7o + 7i = 0. 
This has, of course, two solutions 


a 


2 

X 






(y. 



2(1 + ix + dl. ± -\/(l + Cx + dlY — Adi) 


Note that if = 0 this delivers % = i(l + + -^(1 — d'^Y — 1- 

2 

We must choose the solution which ensures |dx|<l = ^<l=^< 

2 = 70/<7a < 2(7 ^/(Tq = 1 + ^x + < 1 + ^x + dx 7/(1 + ^x + d|)2 — 4d|. 

And so we must choose the + solution. Continuing, we now claim, 

> -(l + Cx + d^ + -\/(l + ^x — d^)^ = -(1 + ^x + d^ + l + ^x —d^) = l + ^x 

This follows if, (1 + ^x + d^)^ — 4d^ > (1 + ^x — d^)^ = (1 + ^x + d‘^Y “ (1 + 
^x — dx)^ > 4dx = 2(1 + ^x)2dx > 4dx = 1 + ^x > 1, which holds. 


12 Spectral Factorization 

Suppose et is a white noise sequence with E{et) = 0,uar(et) = S. Let 
G{L) be a stable causal possibly non-minimum phase filter. Then zt = 
G{L)et has spectrum fzW = G{L)VG'^{L~^) where L = exp{—jX). We 
can then find a unique causal, stable minimum phase spectral factorization. 
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fzW — Go{L)VoG'^{L ^). Let V,Vo have Cholesky factorizations, V = 
JJ^,Vo = JoJj and set Gc{L) = G(L)J, = Go{L)Jo. Then /^(A) = 

Gc{L)G'^{L~^) = Go^c{L)G'^,,{L~^). Since Go,c{L) is minimum phase we 
can introduce the causal filter E{L) = G~l{L)Gc{L) ^ E{L)E'^{L~^) = I. 
Such a filter is called an ^ pass filter mM- Now, G,{L) = GoAL)E{L) 
or G{L) = Go{L)JoE{L)J~^ i.e. a decomposition of a non-minimum phase 
(matrix) filter into a product of a minimum phase filter and an all pass filter. 
We can also write this as, Go{L) = G{L)JE~^{L)J~^ = G{L)JE^ 
showing how the non-minimum phase filter is transformed to yield a spectral 
factor. 


References 

[1] D. Bauer. Estimating linear dynamical systems using subspace. Econo¬ 
metric Theory, 21:181-211, 2005. 

[2] C. Bernasconi and P. Konig. On the directionality of cortical interac¬ 
tions studied by structural analysis of electrophysiological recordings. 
Biol. Cyb., 81:199-210, 1999. 

[3] P.E. Caines. Weak and strong feedback free processes. IEEE Trans. 
Autom. Contr., 21:737-739, 1976. 

[4] P.E. Caines and W.W. Chan. Eeedback between stationary stochastic 
processes. IEEE Trans. Autom. Contr., 20:498-508, 1975. 

[5] Y. Chen, S.L. Bressler, and M. Ding. Erequency domain decomposition 
of conditional granger causality and application to multivariate neural 
field potential data. Jl Neuroscience Methods’, 150:228-237, 2006. 

[6] L.J. Christiano and M. Eichenbaum. Temporal aggregation and struc¬ 
tural inference in macroeconomics. Technical Report -, Eed Reserve 
Bank Minnesota, 1987. 

[7] M. Deistler, K. Peternell, and W. Scherer. Consistency and relative 
efficiency of subspace methods. Automatica, 31:1865-1875, 1995. 

[8] G. Deshpande, K. Sathian, and X. Hu. Effect of hemodynamic vari¬ 
ability on granger causality analysis of fmri. Neuorimage, 52:884-896, 
2010. 


27 



[9] M. Ding, S.L. Dressier, W. Yang, and H Liang. Short-window spec¬ 
tral analysis of cortical event-related potentials by adaptive multivari¬ 
ate autoregressive modeling; data preprocessing, model validation, and 
variability assessment. Biol. Cyb., 83:35-45, 2000. 

[10] J.M. Dufour and E. Renault. Short and long run causality in time 
series; theory. Econometrica, 66:1099-1125, 1998. 

[11] J.M. Dufour and A. Taamouti. Short and long run causality measures: 
theory and inference. Journal of Econometrics, 154:42-58, 2010. 

[12] F. Edin. Scaling errors in measures of brain activity cause erroneous 
estimates of effective connectivity. Neuroimage, 49:6217630, 2010. 

[13] J. Geweke. Temporal aggregation in the multiple regression model. 
Econometrica, 46:643-661, 1978. 

[14] J. Geweke. The measurement of linear dependence and feedback be¬ 
tween multiple time series. Jl. Amer. Stat. Assoc., 77:304-313, 1982. 

[15] J. Geweke. Measures conditional of linear dependence and feedback 
between time series. Jl. Amer. Stat. Assoc., 79:907-915, 1984. 

[16] G.H. Glover. Deconvolution of impulse response in event-related bold 
fmri. Neuroimage, 9:416-429, 1999. 

[17] C.W.J. Granger. Economic processes involving feedback. Information 
and Control, 6:28-48, 1963. 

[18] C.W.J. Granger. Investigating causal relations by econometric models 
and cross-spectral methods. Econometrica, 37, 1969. 

[19] M Green. All pass matrices, the positive real lemma and unit canonical 
correlations between future and past. Jl Multiv Anal, 24:143-154, 1988. 

[20] J. Hamilton. Time Series Analysis. Princeton Univ. Press, Princeton, 
New Jersey, 1994. 

[21] D.A. Handwerker, J. Gonzalez-Castillo, M. D’Esposito, and P.A. Ban- 
dettini. The continuing challenge of understanding and modeling hemo¬ 
dynamic variation in fmri. Neuroimage, 62:1017-1023, 2012. 

[22] E.J. Hannan and M. Deistler. Statistical theory of Linear Systems. J. 
Wiley, New York, 1988. 


28 



[23] E. J. Hannan and D. Poskitt. Unit canonical correlations between future 
and past. Ann. Stat., 16:784-790, 1988. 

[24] R.N.A. Henson. Analysis of fmri time series; Linear time-invariant 
models, event-related fmri and optimal experimental design. In Human 
Brain Function, pages 793 - 822. Elsevier: London, 2003. 

[25] C. Hsiao. Time series modeling and causal ordering of Canadian money, 
income and interest rates. In Time Series Analysis: Theory and Prae- 
tice, North Holland, pages 671-699, 1982. 

[26] T. Kailath, A.H. Sayeed, and B. Hassibi. Linear Estimation. Prentice 
Hall, New York, 2000. 

[27] M.J. Kaminski and K.J. Blinowska. A new method of the description 
of the information flow in the brain structures. Biol. Cyb., 65:203-210, 
1991. 

[28] K. Kaufmann and D.I. Stern. Evidence for human influence on climate 
from hemispheric temperature relations. Nature, 388:39-44, 1997. 

[29] A.M. Kshirsagar. Multivariate Analysis. Marcel Dekker, New York, 
1972. 

[30] W.E. Larimore. System identification, reduced order filters and model¬ 
ing via canonical variate analysis. In Proc. American Control Confer¬ 
ence, pages 445-451. IEEE, 1983. 

[31] H. Lutkepohl. Introduetion to multivariate time series analysis. 
Springer Verlag, Berlin, 1993. 

[32] J.R. Magnus and H. Neudecker. Matrix Differential Caleulus with Ap¬ 
plications in Statistics and Econometrics. J. Wiley, New York, 1999. 

[33] A. Marcet. Temporal aggregation of economic time series, pages 237- 
281, 1985. 

[34] S.M. Pandit and S. Wu. Time Series and Systems Analysis with Ap- 
plieations. J. Wiley, New York, 1983. 

[35] D.A. Pierce and L.D. Haugh. Causality in temporal systems character¬ 
izations and a survey. Journal of Econometrics, 5:265-293, 1977. 

[36] J. Rissannen and M. Wax. Measures of mutual and causal dependence 
between two time series. IEEE Trans. Inf. Thy., 33:598-601, 1987. 


29 



[37] A. Roebroeck, E. Formisano, and R. Goebel. Mapping directed influ¬ 
ence over the brain usng granger causality. Neuroimage, 25:230-242, 
2005. 

[38] A. Roebroeck, E. Formisano, and R. Goebel. The identification of 
interacting networks in the brain using fmri: Model selection, causality 
and deconvolution. Neuroimage, 58:296-302, 2011. 

[39] G.A.F. Seber. Multivariate Observations. J. Wiley, Brisbane, 1984. 

[40] C.A. Sims. Discrete approximations to continuous time distributed lags 
in econometrics. Econometrica, 39:545-563, 1971. 

[41] C.A. Sims. Money, income and causality. Am. Ec. Rev., 62:540-552, 
1972. 

[42] V Solo. Topics in advanced time series analysis. In Lectures in Proba¬ 
bility and Statistics, pages 165-328. G del Pino and R Rebolledo , eds; 
Springer-Verlag, 1986. 

[43] L. Suna and M. Wang. Global warming and global dioxide emission: An 
empirical study. Jl. Environmental Management, 46(4):327-343, 1996. 

[44] L.G. Telser. Discrete samples and moving sums in stationary stochastic 
processes. Jl. Amer. Stat. Assoc., 62:484-499, 1967. 

[45] U. Triacca. Is granger causality analysis appropriate to investigate the 
relationship between atmospheric concentration of carbon dioxide and 
global surface air temperature? Theoretical and Applied Climatology, 
81:133-135, 2005. 

[46] P. van Overschee and B. de Moor. Subspace Identifieation for Linear 
Systems. Kluwer, Dordrecht, 1996. 

[47] O. Yamashita, N. Sadato, T. Okada, and T. Ozaki. Evaluating 
frequency-wise directed connectivity of bold signals applying relative 
power contribution with the linear multivariate time-series models. 
Neuroimage, 25:478-490, 2005. 


30 



